System and Apparatus for Speech Communication
and Speech Recognition



Field of the Invention



The present invention relates to a system and apparatus for speech 
communication and speech recognition. It further relates to signal processing 
methods which can be implemented in the system. 



Background of the Invention

The present applicant's PCT application PCT/SG99/00119, the disclosure of
which is incorporated herein by reference in its entirety, proposes a method of
processing signals in which signals received from an array of sensors are
subject to a first adaptive filter arranged to enhance a target signal, followed
by a second adaptive filter arranged to suppress unwanted signals. The
output of the second filter is converted into the frequency domain, and further
digital processing is performed in that domain.


The present invention seeks to provide a headset system performing 
improved signal processing of audio signals and suitable for speech 
communication. 

The present invention further seeks to provide signal processing methods and
apparatus suitable for use in a speech communication and/or speech
recognition system.

In general terms, a first aspect of the present invention proposes a headset
system including a base unit and a headset unit to be worn by a user (e.g.
resting on the user's head or around the user's shoulders) and having a
plurality of microphones, the headset unit and base unit being in mutual
wireless communication, and at least one of the base unit and the headset
unit having digital signal processing means arranged to perform signal
processing in the time domain on audio signals generated by the
microphones, the signal processing means including at least one adaptive
filter to enhance a wanted signal in the audio signals and at least one
adaptive filter to reduce an unwanted signal in the audio signals.

Preferably, the digital signal processing means are part of the headset unit.

The headset can be used for communication with the base unit, and optionally
with other individuals, especially via the base unit. The headset system may
comprise, or be in communication with, a speech recognition engine for
recognizing speech of the user wearing the headset unit.

Although the signal processing may be as described in PCT/SG99/00119,
more preferably, the signal processing is modified to distinguish between the
noise and interference signals. Signals received from the microphones (array
of sensors) are processed using a first adaptive filter to enhance a target
signal and then divided and supplied to a second adaptive filter arranged to
reduce interference signals and a third filter arranged to reduce noise. The
outputs of the second and third filters are combined, and may be subject to
further processing in the frequency domain.

In fact, this concept provides a second, independent aspect of the invention
which is a method of processing signals received from an array of sensors
comprising the steps of sampling and digitising the received signals and
processing the digitally converted signals, the processing including:

    filtering the digital signals using a first adaptive filter arranged to
    enhance a target signal in the digital signals,

    transmitting the output of the first adaptive filter to a second adaptive
    filter and to a third adaptive filter, the second filter being arranged to
    suppress unwanted interference signals, and the third filter being arranged
    to suppress noise signals; and

    combining the outputs of the second and third filters.

The invention further provides signal processing apparatus for performing 
such a method. 



Brief Description of the Drawings



An embodiment of the invention will now be described by way of example with 
reference to the accompanying drawings in which: 



Fig. 1 illustrates a general scenario in which an embodiment of the invention
may operate.

Fig. 2 is a schematic illustration of a general digital signal processing system
which is an embodiment of the present invention.

Fig. 3 is a system level block diagram of the described embodiment of Fig. 2.

Fig. 4a-d is a flow chart illustrating the operation of the embodiment of Fig. 3.

Fig. 5 illustrates a typical plot of non-linear energy of a channel and the
established thresholds.

Fig. 6(a) illustrates a wave front arriving from a direction 40 degrees
off-boresight.

Fig. 6(b) represents a time delay estimator using an adaptive filter.

Fig. 6(c) shows an impulse response of the filter indicating a wave front from
the boresight direction.

Fig. 7 shows a response of the time delay estimator filter indicating an
interference signal together with a wave front from the boresight direction.

Fig. 8 shows the schematic block diagram of the four channel Adaptive
Spatial Filter.

Fig. 9 is a response curve of the S-shape transfer function (S function).

Fig. 10 shows the schematic block diagram of the Adaptive Interference Filter.

Fig. 11 shows the schematic block diagram of the Adaptive Ambient Noise
Estimator.

Fig. 12 is a block diagram of the Adaptive Signal Multiplexer.

Fig. 13 shows an input signal buffer.

Fig. 14 shows the use of a Hanning Window on overlapping blocks of signals.

Fig. 15 illustrates a sudden rise of noise level in the non-linear energy plot.

Fig. 16 illustrates a specific embodiment of the invention schematically.

Fig. 17 illustrates a headset unit which is a component of the embodiment of
Fig. 16.

Fig. 18, which is composed of Figs. 18(a) and 18(b), shows two ways of
wearing the headset unit of Fig. 17.

Detailed Description of the Embodiment of the Invention

Below, with reference to Figs. 16 and 17, we describe a specific embodiment
of the invention. Before that, we describe in detail a digital signal processing
technique which may be employed by the invention.

FIG. 1 illustrates schematically the operating environment of a signal
processing apparatus 5 of the described embodiment of the invention, shown
in a simplified example of a room. A target sound signal "s" emitted from a
source s' in a known direction impinging on a sensor array, such as a
microphone array 10 of the apparatus 5, is coupled with other unwanted
signals, namely interference signals u1, u2 from other sources A, B, reflections
of these signals u1r, u2r and the target signal's own reflected signal sr. These
unwanted signals cause interference and degrade the quality of the target
signal "s" as received by the sensor array. The actual number of unwanted
signals depends on the number of sources and room geometry but only three
reflected (echo) paths and three direct paths are illustrated for simplicity of
explanation. The sensor array 10 is connected to processing circuitry 20-60
and there will be a noise input q associated with the circuitry which further
degrades the target signal.

An embodiment of signal processing apparatus 5 is shown in FIG.2. The
apparatus observes the environment with an array of four sensors such as
microphones 10a-10d. Target and noise/interference sound signals are
coupled when impinging on each of the sensors. The signal received by each
of the sensors is amplified by an amplifier 20a-d and converted to a digital
bitstream using an analogue to digital converter 30a-d. The bit streams are
fed in parallel to the digital signal processor 40 to be processed digitally. The
processor provides an output signal to a digital to analogue converter 50
which is fed to a line amplifier 60 to provide the final analogue output.

FIG. 3 shows the major functional blocks of the digital processor in more
detail. The multiple input coupled signals are received by the four-channel
microphone array 10a-10d, each of which forms a signal channel, with
channel 10a being the reference channel. The received signals are passed to
a receiver front end which provides the functions of amplifiers 20 and
analogue to digital converters 30 in a single custom chip. The four channel
digitized output signals are fed in parallel to the digital signal processor 40.
The digital signal processor 40 comprises five sub-processors. They are (a) a
Preliminary Signal Parameters Estimator and Decision Processor 42, (b) a
Signal Adaptive Filter 44, (c) an Adaptive Interference Filter 46, (d) an
Adaptive Noise Estimation Filter 48, and (e) an Adaptive Interference and
Noise Cancellation and Suppression Processor 50. The basic signal flow is
from processor 42, to processor 44, to processors 46 and 48, to processor 50.
The output of processor 42 is referred to as "stage 1" in this process; the
output of processor 44 as "stage 2", and the output of processors 46, 48 as
"stage 3". These connections are represented by thick arrows in FIG.3. The
filtered signal S is output from processor 50. Decisions necessary for the
operation of the processor 40 are generally made by processor 42 which
receives information from processors 44-50, makes decisions on the basis of
that information and sends instructions to processors 44-50, through
connections represented by thin arrows in FIG.3. The outputs I, S of the
processor 40 are transmitted to a speech recognition engine 52.


It will be appreciated that the splitting of the processor 40 into the five
component parts 42, 44, 46, 48 and 50 is essentially notional and is made to
assist understanding of the operation of the processor. The processor 40
would in reality be embodied as a single multi-function digital processor
performing the functions described under control of a program with suitable
memory and other peripherals. Furthermore, the operation of the speech
recognition engine 52 also could in principle be incorporated into the
operation of the processor 40.

A flowchart illustrating the operation of the processors is shown in FIG. 4a-d
and this will firstly be described generally. A more detailed explanation of
aspects of the processor operation will then follow.

The front end 20, 30 processes samples of the signals received from array 10
at a predetermined sampling frequency, for example 15kHz. The processor 42
includes an input buffer 43 that can hold N such samples for each of the four
channels. Upon initialization, the apparatus collects a block of N/2 new signal
samples for all the channels at step 500, so that the buffer holds a block of
N/2 new samples and a block of N/2 previous samples. The processor 42
then removes any DC from the new samples and pre-emphasizes or whitens
the samples at step 502.
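
The buffering and conditioning of steps 500 and 502 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, DC removal is done by subtracting the block mean, and the pre-emphasis coefficient 0.95 is a placeholder, since the text specifies neither the method nor the coefficient.

```python
def update_buffer(buffer, new_block):
    """Slide the N-sample buffer: keep the previous N/2 samples, append N/2 new ones."""
    half = len(buffer) // 2
    if len(new_block) != half:
        raise ValueError("new block must hold N/2 samples")
    return list(buffer[half:]) + list(new_block)


def remove_dc(block):
    """Remove DC by subtracting the block mean (assumed method)."""
    mean = sum(block) / len(block)
    return [x - mean for x in block]


def pre_emphasize(block, coeff=0.95, prev=0.0):
    """First-order pre-emphasis y[n] = x[n] - coeff * x[n-1] (coeff is assumed)."""
    out = []
    for x in block:
        out.append(x - coeff * prev)
        prev = x
    return out
```

In use, each channel would keep its own buffer and call `update_buffer` once per block of N/2 new samples before the energy calculation of step 504.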

Following this, the total non-linear energy of a stage 1 signal sample E_r1 and a
stage 3 signal sample E_r3 is calculated at step 504. The samples from the
reference channel 10a are used for this purpose although any other channel
could be used.

There then follows a short initialization period at step 506 in which the first 20
blocks of N/2 samples of signal after start-up are used to estimate a Bark
Scale system noise B_n at step 516 and a histogram Pb at step 518. During
this short period, an assumption is made that no target signals are present.
The updated Pb is then used with updated Pbs to estimate the environment
noise energy E_n, and two detection thresholds, a noise threshold T_n1 and a
larger signal threshold T_n2, are calculated by processor 42 from E_n using
scaling factors. The routine then moves to point B and point F.

After this initialization period, Pbs and B_n are updated when an update
condition is fulfilled.

At step 508, it is determined if the stage 3 signal energy E_r3 is greater than the
noise threshold T_n1. If not, the Bark Scale system noise B_n is updated at step
510 and the routine then proceeds to step 512. If so, the routine skips step 510
and proceeds to step 512. A test is made at step 512 to see if the signal energy E_r1
is greater than the noise threshold T_n1. If so, Pb and Pbs are estimated at step
518 for computing E_n, T_n1 and T_n2. The routine then moves to point B and
point F. If not, only Pbs will be updated and it is used with the previous Pb to
compute E_n, T_n1 and T_n2 at step 514. T_n1 and T_n2 will follow the environment
noise level closely. The histogram is used to determine if the signal energy
level shows a steady state increase which would indicate an increase in
noise, since the speech target signal will show considerable variation over
time and thus can be distinguished. This is illustrated in FIG.15 in which a
signal noise level rises from an initial level to a new level which exceeds both
thresholds.

A test is made at step 520 to see if the estimated energy E_r1 in the reference
channel 10a exceeds the second threshold T_n2. If so, a counter C_L is reset
and a candidate target signal is deemed to be present. The apparatus only
wishes to process candidate target signals that impinge on the array 10 from
a known direction normal to the array, hereinafter referred to as the boresight
direction, or from a limited angular departure therefrom, in this embodiment
plus or minus 15 degrees. Therefore, the next stage is to check for any signal
arriving from this direction.

At step 528 three coefficients are established, namely a correlation coefficient
C_x, a correlation time delay T_d and a filter coefficient peak ratio P_k, which
together provide an indication of the direction from which the target signal
arrived.

At step 530, three tests are conducted to determine if the candidate target
signal is an actual target signal. First, the cross correlation coefficient C_x must
exceed a predetermined threshold T_c; second, the size of the delay coefficient
must be less than a value θ, indicating that the signal has impinged on the
array within the predetermined angular range; and lastly the filter coefficient
peak ratio P_k must exceed a predetermined threshold. If these conditions
are not met, the signal is not regarded as a target signal and the routine
passes to step 534 (non-target signal filtering). If the conditions are met, the
confirmed target signal is fed to step 532 (target signal filtering) of the Signal
Adaptive Spatial Filter 44.
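
The three tests of step 530 amount to a conjunction of threshold comparisons, which may be sketched as below. The function name and the default values for `theta` (expressed here in samples) and `t_pk` are hypothetical placeholders; only T_c = 0.65 is a value given elsewhere in this description.

```python
def is_target_signal(c_x, t_d, p_k, t_c=0.65, theta=2, t_pk=1.5):
    """Return True only if all three direction tests of step 530 pass.

    c_x  : normalized cross correlation coefficient
    t_d  : estimated time delay (samples, signed)
    p_k  : filter coefficient peak ratio
    theta and t_pk are assumed placeholder thresholds.
    """
    return c_x > t_c and abs(t_d) < theta and p_k > t_pk
```

A signal failing any one test would be routed to non-target filtering (step 534), matching the flow described above.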

If, at step 520, the estimated energy E_r1 in the reference channel 10a is found
not to exceed the second threshold T_n2, the target signal is considered not to
be present and the routine passes to step 534 via steps 522-526, in which the
counter C_L is incremented. At step 524, C_L is checked against a threshold T_CL.
If the threshold is reached, block leak compensation is performed on the filter
coefficient W_td and counter C_L is reset at step 526. This block leak
compensation step improves the adaptation speed of the filter coefficient W_td
to the direction of fast changing target sources and environment. If the
threshold is not reached, the program moves to step 534 described below.

Following step 530, the confirmed target signal is fed to step 532 at the Signal
Adaptive Spatial Filter 44. The filter is instructed to perform adaptive filtering
at steps 532 and 536, in which the filter coefficients W_su are adapted to provide
a "target signal plus noise" signal in the reference channel and "noise only"
signals in the remaining channels using the Least Mean Square (LMS)
algorithm. In order to prevent the filter coefficients being updated wrongly, a
running energy ratio R_sd is computed at every sample at step 532. This running
energy ratio R_sd is used as a condition to test whether the filter coefficient
corresponding to that particular sample should be updated or not. The filter 44
output channel equivalent to the reference channel is for convenience referred
to as the Sum Channel and the filter 44 output from the other channels,
Difference Channels. The signal so processed will be, for convenience,
referred to as A'.

If the signal is considered to be a noise signal, the routine passes to step 534
in which the signals are passed through filter 44 without the filter coefficients
being adapted, to form the Sum and Difference channel signals. The signals
so processed will be referred to for convenience as B'.

The effect of the filter 44 is to enhance the signal if this is identified as a target
signal but not otherwise.

At step 538, a new filter coefficient peak ratio P_k2 is calculated based on the
filter coefficient W_su. At step 539, if the signal is not an A' signal from step 532,
the routine passes to step 548. Otherwise, the peak ratio calculated at step 538 is
compared with a best peak ratio BP_k at step 540. If it is larger than the best peak
ratio, the value of the best peak ratio is replaced by this new peak ratio P_k2 and all
the filter coefficients W_su are stored as the best filter coefficients at step 542. If
it is not, the peak ratio is again compared with a threshold T_Pk at step 544.
If the peak ratio is below the threshold, a wrong update on the filter
coefficients is deemed to have occurred and the filter coefficients are restored to
the previously stored best filter coefficients at step 546. If it is above the
threshold, the routine passes to step 548.
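
The store-and-restore logic of steps 539 to 546 can be illustrated with a small sketch. The function and dictionary key names are hypothetical, and the threshold value is left to the caller because the text does not give one.

```python
def track_best_coefficients(state, coeffs, peak_ratio, is_target, t_pk):
    """Return the coefficients to carry forward after steps 539-546.

    state : dict with keys 'best_ratio' and 'best_coeffs' (hypothetical layout)
    """
    if not is_target:                      # step 539: only A' signals are checked
        return coeffs
    if peak_ratio > state["best_ratio"]:   # steps 540 and 542: new best, store it
        state["best_ratio"] = peak_ratio
        state["best_coeffs"] = list(coeffs)
        return coeffs
    if peak_ratio < t_pk:                  # steps 544 and 546: wrong update, roll back
        return list(state["best_coeffs"])
    return coeffs                          # acceptable update, keep as is
```

The roll-back guards against the adaptive filter drifting away from a previously well-converged solution when a block is misclassified as target.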

At step 548, an energy ratio R_sd and power ratio P_rsd between the Sum
Channel and the Difference Channels are estimated by processor 42. Besides
these, two other coefficients are also established, namely an energy ratio
factor R_sdf and a second stage non-linear signal energy E_r2. Following this, the
adaptive noise power threshold T_Prsd is updated based on the calculated
power ratio P_rsd.


At this point, the signal is divided into two parallel paths, namely point C and
point D. Following point C, the signal is subject to a further test at step 552 to
determine whether noise or interference is present. First, if the signals are A'
signals from step 532, the routine passes to step 556. Second, if the
estimated energy E_r2 is found not to exceed the second threshold T_n2, the
signal is considered not to be present and the routine passes to step 556.
Third, the filter coefficient peak ratio P_k2 is compared to a threshold T_Pk2. If it is
higher than the threshold, this may indicate that there is a target signal and the
routine passes to step 556. Lastly, R_sd and P_rsd are compared to thresholds
T_rsd and T_Prsd respectively. If the ratios are both lower than their thresholds, this
indicates probable noise but if higher, this may indicate that there has been
some leakage of the target signal into the Difference channel, indicating the
presence of a target signal after all. For such target signals, the routine also
passes to step 556. For all other non-target signals, the routine passes to step
554.

At steps 554-558, the signals are processed by the Adaptive Interference Filter
46, the purpose of which is to reduce the unwanted signals. The filter 46, at
step 554, is instructed to perform adaptive filtering on the non-target signals
with the intention of adapting the filter coefficients to reduce the unwanted
signal in the Sum channel to some small error value e_c1. This computed e_c1 is
also fed back to step 554 to prevent signal cancellation caused by wrong
updating of filter coefficients.

In the alternative, at step 556, the target signals are fed to the filter 46 but this
time, no adaptive filtering takes place, so the Sum and Difference signals
pass through the filter.

The output signals from processor 46 are thus the Sum channel signal S_c1 and
filtered Difference signal S_i.

Following point D, the signals pass through a few test conditions at step
560. First, if the signals are A' signals from step 532, the routine passes to
step 564. Second, if the signals are classified as non-target signals by step 552
(C signal), the routine passes to step 564. Third, R_sdf and P_rsd are
compared to thresholds T_rsdf and T_Prsd respectively. If the ratios are both lower
than their thresholds, this indicates a probable ambient noise signal but if higher, this
may indicate that there has been some leakage of the target signal into the
Difference channel, indicating the presence of a target signal after all. Lastly,
if the estimated energy E_r2 is found to exceed the first threshold T_n1, signals are
considered to be present. For such signals, the routine also passes to step
564. For all other ambient noise signals, the routine passes to step 562.

At steps 562-566, the signals are processed by the Adaptive Ambient Noise
Estimation Filter 48, the purpose of which is to reduce the unwanted ambient
noise. The filter 48, at step 562, is instructed to perform adaptive filtering on
the ambient noise signals with the intention of adapting the filter coefficients to
reduce the unwanted ambient noise in the Sum channel to some small error
value e_c2.






In the alternative, at step 564, the signals are fed to the filter 48 but this time,
no adaptive filtering takes place, so the Sum and Difference signals pass
through the filter.

The output signals from processor 48 are thus the Sum channel signal S_c2
and filtered Difference signal S_n.

At step 568, the output signals from processor 46 (S_c1 and S_i) and the output
signals from processor 48 (S_c2 and S_n) are processed by an adaptive signal
multiplexer. Here, those signals are multiplexed and a weighted average error
signal e_s(t), a sum signal S_c(t) and a weighted average interference signal I_s(t)
are produced. These signals are then collected for the new N/2 samples and
the last N/2 samples from the previous block and a Hanning Window H_n is
applied to the collected samples as shown in FIG.13 to form vectors S_h, I_h and
E_h. This is an overlapping technique with overlapping vectors S_h, I_h and E_h
being formed from past and present blocks of N/2 samples continuously. This
is illustrated in FIG.14. A Fast Fourier Transform is then performed on the
vectors S_h, I_h and E_h to transform the vectors into frequency domain
equivalents S_f, I_f and E_f at step 570.

At step 572, a modified spectrum is calculated for the transformed signals to
provide "pseudo" spectrum values P_s and P_i.

In order to reduce signal distortion due to wrong estimation of the noise
spectra, a frequency scanning is performed between P_s and P_i to look for the
peaks in the same frequency components at step 574. Attenuation is then
performed on those peaks in P_i to reduce the signal cancellation effect. P_s and
P_i are then warped into the same Bark Frequency Scale to provide Bark
Frequency scaled values B_s and B_i at step 576. At step 578, a voiced/unvoiced
detection is performed on B_s and B_i to reduce the signal cancellation on the
unvoiced signal.

A weighted combination B_y of B_n (through path F) and B_i is then made at step
580 and this is combined with B_s to compute the Bark Scale non-linear gain
G_b at step 582.

G_b is then unwrapped to the normal frequency domain to provide a gain value
G at step 584 and this is then used at step 586 to compute an output
spectrum S_out using the signal spectrum S_f and E_f from step 570. This
gain-adjusted spectrum suppresses the interference signals, the ambient noise and
system noise.

An inverse FFT is then performed on the spectrum S_out at step 588 and the
output signal is then reconstructed from the overlapping signals using the
overlap add procedure at step 590.
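
The windowing and overlap add reconstruction can be sketched without the frequency domain stage, which is omitted here so that the windowed blocks are simply added back together. The sketch assumes a periodic Hanning window w[n] = 0.5 - 0.5 cos(2πn/N) and a hop of N/2 samples; the function names are hypothetical. With these assumptions, overlapping halves of the window sum to one, so the interior of the signal is recovered unchanged.

```python
import math


def hann(n_len):
    """Periodic Hanning window of length n_len (assumed definition)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]


def windowed_blocks(signal, n_len):
    """Cut the signal into blocks of n_len samples with 50% overlap and window them."""
    hop = n_len // 2
    w = hann(n_len)
    return [[w[n] * signal[start + n] for n in range(n_len)]
            for start in range(0, len(signal) - n_len + 1, hop)]


def overlap_add(blocks, hop):
    """Reconstruct a signal by adding blocks placed hop samples apart."""
    out = [0.0] * (hop * (len(blocks) - 1) + len(blocks[0]))
    for b, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[b * hop + n] += value
    return out
```

In the full system, each windowed block would be transformed, gain adjusted and inverse transformed before `overlap_add` is applied.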

Hence, besides providing the Speech Recognition Engine 52 with
a processed signal S, the system also provides a set of useful information
indicated as I on Fig. 3. This set of information may include any one or more
of:

1. The direction of speech signal, T_d (step 528).

2. Signal energy, E_r1 (step 504).

3. Noise thresholds, T_n1 and T_n2 (steps 514 and 518).

4. Estimated SINR (signal to interference noise ratio) and SNR
   (signal to noise ratio), and R_sd (step 548).

5. Target speech signal presence, A' (steps 530 and 532).

6. Spectrum of processed speech signal, S_out (step 586).

7. Potential speech start and end point.

8. Interference signal spectrum, I_f (step 570).


Major steps in the above described flowchart will now be described in more 
detail. 

Non-Linear Energy Estimation (STEPS 504, 548)

At each stage of adaptive filtering, the reference signal is taken at a delay of half
the tap-size. Thus, at the end of the two stages of adaptive filtering, the signal is
delayed by L_su/2 and L_uq/2. In order for the decision-making mechanism for the
different stages to accurately follow these delays, the signal energy
calculations are performed at 3 junctions, resulting in 3 pairs of the signal
energy.

The first signal energy is calculated at no delay and is used by the time delay
estimation and stage 1 Adaptive Spatial Filter.

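\[ E_{r1} = \frac{1}{J-2} \sum_{i=0}^{J-2} \left[ x(i)^2 - x(i+1)\,x(i-1) \right] \qquad \text{(A.1)} \]

The second signal energy is calculated at a delay of half of the Adaptive Spatial
Filter tap-size, L_su/2.

\[ E_{r2} = \frac{1}{J-2} \sum_{i=-L_{su}/2}^{J-L_{su}/2-2} \left[ x(i)^2 - x(i+1)\,x(i-1) \right] \qquad \text{(A.2)} \]

The last signal energy is calculated at a delay of L_su/2 + L_uq/2 and is used
by noise updating.

\[ E_{r3} = \frac{1}{J-2} \sum_{i=-(L_{su}/2+L_{uq}/2)}^{J-(L_{su}/2+L_{uq}/2)-2} \left[ x(i)^2 - x(i+1)\,x(i-1) \right] \qquad \text{(A.3)} \]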

These delays are implemented by means of buffering. 
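
As an illustration of the non-linear energy measure and of taking it at a delay by buffering, a minimal sketch follows. The function names are hypothetical, and the summation is restricted to the interior samples of a J-sample window so that no sample outside the window is needed; the exact summation limits used in the embodiment may differ.

```python
def nonlinear_energy(window):
    """Average of x(i)^2 - x(i+1)*x(i-1) over the interior of a J-sample window."""
    j = len(window)
    if j < 3:
        raise ValueError("window must hold at least 3 samples")
    total = sum(window[i] ** 2 - window[i + 1] * window[i - 1]
                for i in range(1, j - 1))
    return total / (j - 2)


def delayed_window(buffer, j, delay):
    """Return the J-sample window that ends `delay` samples before the newest sample."""
    end = len(buffer) - delay
    if end - j < 0:
        raise ValueError("buffer too short for the requested delay")
    return buffer[end - j:end]
```

For example, the third energy would be computed as `nonlinear_energy(delayed_window(buf, J, L_su // 2 + L_uq // 2))`, with `buf` long enough to hold the extra history.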

Threshold Estimation and Updating (STEPS 514, 518)

The processor 42 estimates two thresholds T_n1 and T_n2 based on a statistical
approach. Two histograms, referred to as Pb and Pbs, are computed in
the same way, except that Pbs is computed every block of N/2 samples and
Pb is computed only on the first 20 blocks of N/2 samples or when E_r1 < T_n1,
which means that neither a target signal nor an interference signal is
present. E_r1 is used as the input sample of the histograms, and the length of
the histograms is a number M (which may for example be 24). Each
histogram is found from the following equation:

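\[ H_i = \alpha H_i + (1 - \alpha)\,\delta(i - D) \qquad \text{(B.1)} \]

where H_i stands for either of Pb and Pbs, and has the form:

\[ H_i = \left[\, h(1),\ h(2),\ \dots,\ h(i),\ \dots,\ h(24) \,\right]^T \qquad \text{(B.2)} \]

\[ \delta(i - D) = \left[\, \delta(1 - D),\ \delta(2 - D),\ \dots,\ \delta(24 - D) \,\right]^T \qquad \text{(B.3)} \]

\[ \delta(i - D) = \begin{cases} 1, & i = D \\ 0, & i \neq D \end{cases} \qquad \text{(B.4)} \]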

Thus, α is a forgetting factor. For Pb, α is chosen empirically to be 0.9988 and
for Pbs, α is equal to 0.9688.



The value of D which is used in Equation B.1 is determined using table 1
below. Specifically, we find the value of Emax in table 1 which is lowest but
which is above the input sample E_r1, and the corresponding D is used in
Equation B.1. Thus, each D labels a corresponding band of values for E_r1. For
example, if E_r1 is 412, this falls in the band up to Emax = 424, i.e. the range
corresponding to D=13, and accordingly D=13 is used in Equation B.1. Thus,
if E_r1 continues to stay at a certain level, say in the band up to Emax(D), the
weight of the corresponding D value in the histogram will start to build up to
become the maximum. It indicates that the current running average noise
level is approximately Emax(D).
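
The band lookup and the recursive histogram update can be sketched as follows. The function names are hypothetical and the `emax` list in the usage note is a short placeholder, not the values of table 1; clamping samples above the last band to the top band is also an assumption.

```python
def band_index(e_r1, emax):
    """Return the 1-based band D: the lowest Emax entry that lies above e_r1."""
    for d, limit in enumerate(emax, start=1):
        if e_r1 < limit:
            return d
    return len(emax)  # assumed: clamp samples above the table to the top band


def update_histogram(hist, d, alpha):
    """One recursive update: every bin decays by alpha, bin D gains (1 - alpha)."""
    return [alpha * h + (1.0 - alpha) * (1.0 if i == d else 0.0)
            for i, h in enumerate(hist, start=1)]
```

With a placeholder table such as `emax = [100.0, 200.0, 424.0, 800.0]`, an input of 412 maps to the third band, and repeated updates with the same band drive that bin towards one while the others decay, so the peak of the histogram tracks the prevailing energy level.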

After computing Pb and Pbs, the peak values of Pb and Pbs are labelled pp
and pps respectively. pp is reset to be equal to (pps - 5) if (pps - pp) > 5.

Below is the pseudo-C which uses pp to estimate T_n1 and T_n2:

    Np = Emax[pp];                      /* noise level of the peak band */
    Rpp = En / (En + Np);
    gamma = sfun(Rpp, 0, 0.8);          /* S function with limits 0 and 0.8 */
    Ep = gamma*Ep + (1 - gamma)*En;     /* first order recursive smoothing */

    if (En >= Ep)
        En = 0.7*En + 0.3*Ep;
    else if (En <= Er_old)
    {
        En = 0.9995*En + 0.0005*Ep;
        Er_old = En;
    }
    else
        En = 0.995*En + 0.005*Ep;

The Emax values in table 1 were chosen experimentally based on a statistical
method. Samples (in this case, E_r1) were collected under certain environments
(office, car, supermarket, etc.) and a histogram was generated based on the
collected samples. From the histogram, a probability density function is
computed and from there the Emax values were decided.

Similarly, all the factors in the first order recursive filters and the lower and upper
limits of the S function above are chosen empirically. Once the noise energy E_n
is obtained, the two signal detection thresholds T_n1 and T_n2 are established as
follows:

\[ T_{n1} = \delta_1 E_n \qquad \text{(B.5)} \]
\[ T_{n2} = \delta_2 E_n \qquad \text{(B.6)} \]



δ_1 and δ_2 are scalar values that are used to select the thresholds so as to
optimize signal detection and minimize false signal detection. As shown in
FIG.5, T_n1 should be above the system noise level, with T_n2 sufficient to be
generally breached by the potential target signal. These factors may be found
by trial and error. In this embodiment, δ_1 = 1.375 and δ_2 = 1.675 have been
found to give good results.
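
As a small worked example of the two thresholds, the sketch below scales a noise energy estimate by the two factors quoted for this embodiment. The function names and the three-way labelling are hypothetical conveniences, not part of the described flowchart.

```python
DELTA_1 = 1.375  # scaling factor for the noise threshold T_n1 (from the text)
DELTA_2 = 1.675  # scaling factor for the signal threshold T_n2 (from the text)


def detection_thresholds(e_n):
    """Return (T_n1, T_n2) for a given environment noise energy E_n."""
    return DELTA_1 * e_n, DELTA_2 * e_n


def energy_class(energy, e_n):
    """Label an energy value relative to the two thresholds (labels are illustrative)."""
    t_n1, t_n2 = detection_thresholds(e_n)
    if energy > t_n2:
        return "candidate target"
    if energy > t_n1:
        return "above noise"
    return "noise"
```

For a noise energy of 100, the thresholds evaluate to approximately 137.5 and 167.5, so an energy of 150 lies between them.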

In comparison to the algorithms for setting T_n1 and T_n2 in PCT/SG99/00119,
the noise level can be tracked more robustly yet faster. A further motivation
for the above algorithm for finding the thresholds is to distinguish between
signal and noise in all environments, especially noisy environments (car,
supermarket, etc.). This means that the user can use the embodiment
anywhere.



Time Delay Estimation T_d (STEP 528)



FIG. 6A illustrates a single wave front impinging on the sensor array. The wave
front impinges on sensor 10d first (A as shown) and at a later time impinges on
sensor 10a (A' as shown), after a time delay t_d. This is because the signal
originates at an angle of 40 degrees from the boresight direction. If the signal
originated from the boresight direction, the time delay t_d would ideally have
been zero.

Time delay estimation is performed using a tapped delay line time delay
estimator included in the processor 42 which is shown in Fig. 6B. The filter has a
delay element 600, having a delay Z^(-L/2), connected to the reference channel 10a
and a tapped delay line filter 610 having a filter coefficient W_td connected to
channel 10d. Delay element 600 provides a delay equal to half of that of the
tapped delay line filter 610. The output from the delay element is d(k) and from
filter 610 is d'(k). The difference of these outputs is taken at element 620
providing an error signal e(k) (where k is a time index used for ease of
illustration). The error is fed back to the filter 610. The Least Mean Squares
(LMS) algorithm is used to adapt the filter coefficient W_td as follows:



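\[ W_{td}(k+1) = W_{td}(k) + 2\,\mu_{td}\,S_{10d}(k)\,e(k) \qquad \text{(B.1)} \]

\[ W_{td}(k) = \left[\, W_{td}^{1}(k),\ W_{td}^{2}(k),\ \dots,\ W_{td}^{L_0}(k) \,\right]^T \qquad \text{(B.2)} \]

\[ S_{10d}(k) = \left[\, S_{10d}^{1}(k),\ S_{10d}^{2}(k),\ \dots,\ S_{10d}^{L_0}(k) \,\right]^T \qquad \text{(B.3)} \]

\[ e(k) = d(k) - d'(k) \qquad \text{(B.4)} \]

\[ d'(k) = W_{td}(k)^T \cdot S_{10d}(k) \qquad \text{(B.5)} \]

\[ \mu_{td} = \frac{\beta_{td}}{\left\| S_{10d}(k) \right\|} \qquad \text{(B.6)} \]

where β_td is a user selected convergence factor, 0 < β_td < 2, ‖ ‖ denotes the norm of
a vector, k is a time index and L_0 is the filter length.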

The impulse response of the tapped delay line filter 610 at the end of the
adaptation is shown in Fig. 6C. The impulse response is measured and the
position of the peak or the maximum value of the impulse response relative to
origin O gives the time delay T_d between the two sensors, which also indicates
the angle of arrival of the signal. In the case shown, the peak lies at the
centre, indicating that the signal comes from the boresight direction (T_d = 0).
The threshold θ at step 506 is selected depending upon the assumed possible
degree of departure from the boresight direction from which the target signal
might come. In this embodiment, θ is equivalent to ±15°.
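
The adaptation loop of equations B.1 to B.6 can be sketched in Python as
follows. This is a minimal illustration, not the implementation of the
embodiment: the function name estimate_delay_lms, the default filter length and
the convergence factor are assumptions, and the step size is normalised by the
squared norm of the tap vector (the usual normalised-LMS form), which is also
an assumption of this sketch.

```python
def estimate_delay_lms(ref, far, taps=17, beta=0.25):
    """Estimate the delay (in samples) of `ref` (channel 10a) relative to
    `far` (channel 10d) from the peak of an LMS-adapted tapped delay line.

    All names and default values are illustrative assumptions.
    """
    half = taps // 2                  # delay element 600: half the filter length
    w = [0.0] * taps                  # coefficients W_td of filter 610
    for k in range(taps - 1, len(far)):
        s = [far[k - n] for n in range(taps)]          # tapped delay line contents
        d = ref[k - half]                              # d(k): delayed reference
        d_hat = sum(wi * si for wi, si in zip(w, s))   # d'(k): filter output
        e = d - d_hat                                  # e(k) = d(k) - d'(k)
        energy = sum(si * si for si in s) + 1e-12      # avoid division by zero
        mu = beta / energy                             # assumed squared-norm normalisation
        w = [wi + 2.0 * mu * si * e for wi, si in zip(w, s)]
    # The tap with the largest magnitude marks the delay relative to the centre.
    peak = max(range(taps), key=lambda n: abs(w[n]))
    return peak - half, w
```

A returned delay of 0 (peak at the centre tap) corresponds to a signal arriving
from the boresight direction.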



WO 03/036614    PCT/SG02/00149



Normalized Cross Correlation Estimation C_x (STEP 528)

The normalized cross-correlation between the reference channel 10a and the
most distant channel 10d is calculated as follows:

Samples of the signals from the reference channel 10a and channel 10d are
buffered into shift registers X and Y, where X is of length J samples and Y is
of length K samples, where J > K, to form two independent vectors X_r and Y_r:



    X_r = [x_r(1), x_r(2), \ldots, x_r(J)]^T        ...C.1

    Y_r = [y_r(1), y_r(2), \ldots, y_r(K)]^T        ...C.2



A time delay between the signals is assumed, and to capture this difference, J
is made greater than K. The difference is selected based on the angle of
interest. The normalized cross-correlation is then calculated as follows:

    C_x(l) = \frac{Y_r^T X_{rl}}{\|Y_r\| \, \|X_{rl}\|}                   ...C.3

Where:

    X_{rl} = [x_r(l), x_r(l+1), \ldots, x_r(K+l-1)]^T                   ...C.4

Where T represents the transpose of the vector, || || represents the norm of
the vector and l is the correlation lag. l is selected to span the delay of
interest. For a sampling frequency of 16 kHz and a spacing between sensors 10a,
10d of 18 cm, the lag l is selected to be five samples for an angle of interest
of 15°.

The threshold T_c is determined empirically. T_c = 0.65 is used in this
embodiment.
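
As a hedged illustration of the normalized cross-correlation test, the sketch
below correlates the shorter buffer Y against a window of the longer buffer X
at each lag and keeps the largest value. The function names and the zero-based
lag convention are assumptions made for this example only.

```python
import math


def normalized_cross_correlation(x, y, lag):
    """Correlation of y with the window x[lag : lag + len(y)], normalised by
    the vector norms. `lag` is zero-based here (an illustrative convention)."""
    k = len(y)
    window = x[lag:lag + k]
    if len(window) < k:
        raise ValueError("lag leaves too few samples in x")
    num = sum(a * b for a, b in zip(y, window))
    den = math.sqrt(sum(a * a for a in window)) * math.sqrt(sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def max_correlation(x, y):
    """Largest normalised correlation over all lags that fit, i.e. J - K + 1 lags."""
    lags = len(x) - len(y) + 1
    return max(normalized_cross_correlation(x, y, lag) for lag in range(lags))
```

With J - K + 1 = 5 this scans the five lags mentioned above; the result would
then be compared against the threshold T_c = 0.65.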



Block Leak Compensation LMS for Time Delay Estimation (STEP 525)

In the time delay estimation LMS algorithm, a modified leak compensation
form is used. This is simply implemented by:

    W_{td} = \alpha W_{td}        (where \alpha = forgetting_factor = 0.98)

This leak compensation form has the property of adapting faster to the
direction of fast changing sources and environment.






Filter Coefficient Peak Ratio, P_k (STEP 528)

The impulse response of the tapped delay line filter with filter coefficients
W_td at the end of the adaptation in the presence of both signal and
interference sources is shown in FIG. 7. The filter coefficient W_td is as
follows:

    W_{td}(k) = [W_{td}^{0}(k), W_{td}^{1}(k), \ldots, W_{td}^{L_0}(k)]^T

In the presence of both signal and interference sources, there will be more
than one peak in the tapped delay line filter coefficients. The P_k ratio is
calculated as follows:

    P_k = \frac{A}{A + B}

    A = \max W_{td}^{n}
        where  L_0/2 - \Delta \le n \le L_0/2 + \Delta

    B = \max W_{td}^{n}
        where  0 \le n < L_0/2 - \Delta  or  L_0/2 + \Delta < n \le L_0

Δ is calculated based on the threshold θ at step 530. In this embodiment, with
θ equal to ±15°, Δ is equivalent to 2. A low P_k ratio indicates the presence
of strong interference signals over the target signal and a high P_k ratio
shows a high target signal to interference ratio.
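
The peak ratio can be sketched as below: A is the largest coefficient within
Δ taps of the centre and B is the largest coefficient outside that window. The
function name peak_ratio and the handling of a zero denominator are assumptions
of this example.

```python
def peak_ratio(w, delta=2):
    """Peak ratio A / (A + B) for tapped delay line coefficients w[0..L0].

    A: maximum coefficient within `delta` taps of the centre tap L0/2.
    B: maximum coefficient outside that central window.
    """
    l0 = len(w) - 1
    centre = l0 // 2
    lo, hi = centre - delta, centre + delta
    a = max(w[lo:hi + 1])
    b = max(w[:lo] + w[hi + 1:])
    total = a + b
    return a / total if total != 0.0 else 0.0   # zero guard is an assumption
```

For example, a dominant central peak of 0.9 with a side peak of 0.3 gives a
ratio of 0.75.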
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Adaptive Spatial Filter 44 (STEPS 532-536) 

FIG. 8 shows a block diagram of the Adaptive Linear Spatial Filter 44. The
function of the filter is to separate the coupled target, interference and
noise signals into two types. The first, in a single output channel termed the
Sum Channel, is an enhanced target signal having weakened interference and
noise, i.e. signals not from the target signal direction. The second, in the
remaining channels termed Difference Channels, which in the four channel
case comprise three separate outputs, aims to comprise interference and
noise signals alone.

The objective is to adapt the filter coefficients of filter 44 in such a way as
to enhance the target signal and output it in the Sum Channel and at the
same time eliminate the target signal from the coupled signals and output
them into the Difference Channels.

The adaptive filter elements in filter 44 act as linear spatial prediction
filters that predict the signal in the reference channel whenever the target
signal is present. The filter stops adapting when the signal is deemed to be
absent.

The filter coefficients are updated whenever the conditions of the preceding
steps are met, namely:

i. The adaptive threshold detector detects the presence of signal;
ii. The peak ratio exceeds a certain threshold;
iii. The running R_sd exceeds a certain threshold.

As illustrated in FIG. 8, the digitized coupled signal X_0 from sensor 10a is
fed through a digital delay element 710 of delay Z^{-Lsu/2}. Digitized coupled
signals X_1, X_2, X_3 from sensors 10b, 10c, 10d are fed to respective filter
elements 712,4,6. The outputs from elements 710,2,4,6 are summed at summing
element 718, the output from the summing element 718 being divided by four
at the divider element 719 to form the Sum Channel output signal. The output
from delay element 710 is also subtracted from the outputs of the filters
712,4,6 at respective difference elements 720,2,4, the output from each
difference element forming a respective Difference Channel output signal,
which is also fed back to the respective filter 712,4,6. The function of the
delay element 710 is to time align the signal from the reference channel 10a
with the output from the filters 712,4,6.



The filter elements 712,4,6 adapt in parallel using the normalized LMS
algorithm given by Equations E.1...E.8 below; the output of the Sum Channel
being given by equation E.1 and the output from each Difference Channel
being given by equation E.6:

    S_c(k) = \frac{1}{M} \sum_{m=0}^{M-1} S_m(k)                       E.1

Where:

    S_0(k) = X_0(k - L_{su}/2)                                          E.2

    S_m(k) = (W_{su}^{m}(k))^T X_m(k)                                   E.3

Where m is 0,1,2...M-1, the number of channels, in this case 0...3, and T
denotes the transpose of a vector.

    X_m(k) = [x_m^{1}(k), x_m^{2}(k), \ldots, x_m^{L_{su}}(k)]^T          E.4

    W_{su}^{m}(k) = [w_{su}^{1}(k), w_{su}^{2}(k), \ldots, w_{su}^{L_{su}}(k)]^T   E.5

Where X_m(k) and W_su^m(k) are column vectors of dimension (Lsu x 1).

The weight W_su^m(k) is updated using the normalized LMS algorithm as follows:

    d_{cm}(k) = S_0(k) - S_m(k)                                         E.6

    W_{su}^{m}(k+1) = W_{su}^{m}(k) + 2 \mu_{su}^{m} X_m(k) d_{cm}(k)     E.7

Where:

    \mu_{su}^{m} = \frac{\beta_{su}}{\|X_m(k)\|}                          E.8

and where β_su is a user selected convergence factor 0 < β_su ≤ 2, || || denotes
the norm of a vector and k is a time index.
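
One sample of the Sum/Difference Channel computation might look like the
following sketch. It is illustrative only: the function name
spatial_filter_step and its argument layout are assumptions, and the step size
is normalised by the squared norm of the tap vector (standard normalised LMS),
which may differ from the normalisation used in the embodiment.

```python
def spatial_filter_step(s0, taps, weights, beta=0.25, adapt=True):
    """Process one sample of the adaptive spatial filter.

    s0      -- delayed reference sample (output of delay element 710)
    taps    -- tap vectors X_m(k) for the non-reference channels
    weights -- weight vectors W_su^m, updated in place when `adapt` is True
    Returns (sum_channel_sample, list_of_difference_channel_samples).
    """
    outs = [sum(w * x for w, x in zip(wm, xm)) for wm, xm in zip(weights, taps)]
    s_c = (s0 + sum(outs)) / (len(outs) + 1)      # Sum Channel: average of M channels
    diffs = [s0 - s for s in outs]                # Difference Channels
    if adapt:                                     # only while the update conditions hold
        for wm, xm, d in zip(weights, taps, diffs):
            energy = sum(x * x for x in xm) + 1e-12   # guard against division by zero
            mu = beta / energy                        # assumed squared-norm normalisation
            for i, x in enumerate(xm):
                wm[i] += 2.0 * mu * x * d
    return s_c, diffs
```

When every channel carries only the target signal, the Difference Channel
outputs decay towards zero while the Sum Channel tracks the reference, which is
the behaviour the filter aims for.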






Running R_sd within Adaptive Spatial Filter (STEP 532)

To prevent the filter coefficients from being updated wrongly, conditions
evaluated once per block of N/2 samples are insufficient. A running R_sd is
computed every N/2 samples and is used with the other conditions to test
whether that particular sample should update the coefficients or not.

Running R_sd is calculated as follows:

    R_{sd} = \frac{E_{SUM}}{E_{DIF}}                                              F.9

Where:

    E_{SUM} = 0.98 E_{SUM} + 0.02 (abs[(S_c(k+1))^2 - S_c(k) S_c(k+2)])      F.10

    E_{DIF} = 0.98 E_{DIF} + 0.02 (abs[(D_c(k+1))^2 - D_c(k) D_c(k+2)])      F.11

and where S_c is the Sum Channel signal and D_c is the Difference Channel
signal.

Adaptive Spatial Filter Coefficient Restoration (STEPS 540-546) 

In the event of wrong updating, the coefficients of the filter could adapt to
the wrong direction or sources. To reduce the effect, a set of 'best
coefficients' is kept and copied to the beam-former coefficients when it is
detected to be pointing in a wrong direction after an update.

Two mechanisms are used for this:

A set of 'best weights' includes all of the three filter coefficients
(W_su^1 - W_su^3). They are saved based on the following condition: when there
is an update of the filter coefficients W_su, the calculated P_k ratio is
compared with the previously stored BP_k; if it is above the BP_k, this new set
of filter coefficients shall become the new set of 'best weights' and the
current P_k ratio is saved as the new BP_k.

A second mechanism is used to decide when the filter coefficients should be
restored with the saved set of 'best weights'. This is done when the filter
coefficients are updated and the calculated P_k ratio is below BP_k and below
the threshold T_Pk. In this embodiment, the value of T_Pk is equal to 0.65.



Calculation of Energy Ratio R_sd (STEP 548)

This is performed as follows:

    S_c = [S_c(0), S_c(1), \ldots, S_c(J-1)]^T

    D_c = \sum_{m=1}^{M-1} [d_{cm}(0), d_{cm}(1), \ldots, d_{cm}(J-1)]^T

J = N/2, the number of samples, in this embodiment 256.

    E_{SUM} = \frac{1}{J-2} \sum_{j=1}^{J-2} |S_c(j)^2 - S_c(j+1) S_c(j-1)|      F.3

    E_{DIF} = \frac{1}{3(J-2)} \sum_{j=1}^{J-2} |D_c(j)^2 - D_c(j+1) D_c(j-1)|

    R_{sd} = \frac{E_{SUM}}{E_{DIF}}

Where E_SUM is the sum channel energy and E_DIF is the difference channel
energy.

The energy ratio between the Sum Channel and Difference Channel (R_sd) must
not exceed a predetermined threshold. In the four channel case illustrated here
the threshold is determined to be about 1.5.
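
A possible way to compute such an energy ratio is sketched below, assuming the
block energies are measured with the operator |x(j)^2 - x(j+1) x(j-1)| averaged
over the block. The function names, the summing of the Difference Channels and
the division by their count are assumptions of this sketch, not details
confirmed by the text.

```python
def block_energy(x):
    """Mean of |x(j)^2 - x(j+1) * x(j-1)| over the interior samples of a block."""
    j = len(x)
    total = sum(abs(x[i] * x[i] - x[i + 1] * x[i - 1]) for i in range(1, j - 1))
    return total / (j - 2)


def energy_ratio(sum_channel, diff_channels):
    """Ratio of Sum Channel energy to (assumed) combined Difference Channel energy."""
    combined = [sum(samples) for samples in zip(*diff_channels)]  # sample-wise sum
    e_sum = block_energy(sum_channel)
    e_dif = block_energy(combined) / len(diff_channels)
    return e_sum / e_dif if e_dif > 0.0 else float("inf")
```

For a pure sinusoid sin(w n) the block energy reduces to sin(w)^2, which makes
the operator easy to check by hand.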

Calculation of Power Ratio P_rsd (STEP 548)

This is performed as follows:

    S_c = [S_c(0), S_c(1), \ldots, S_c(J-1)]^T

    D_c = [D_c(0), D_c(1), \ldots, D_c(J-1)]^T

J = N/2, the number of samples, in this embodiment 128.

Where P_SUM is the sum channel power and P_DIF is the difference channel
power.

    P_{SUM} = \frac{1}{J} \sum_{j=0}^{J-1} S_c(j)^2

    P_{DIF} = \frac{1}{3J} \sum_{j=0}^{J-1} D_c(j)^2

    P_{rsd} = \frac{P_{SUM}}{P_{DIF}}

The power ratio between the Sum Channel and Difference Channel must not
exceed a dynamic threshold, T_Prsd.

Calculation of Energy Ratio Factor R_sdf (STEP 548)

This Energy Ratio Factor R_sdf is obtained by passing R_sd through a non-linear
S-shaped transfer function as shown in FIG. 9. Certain ranges of the R_sd value
can be boosted or suppressed by changing the shape of the transfer function
using different sets of threshold levels, S_L and S_H.

Dynamic Noise Power Threshold Updating, T_Prsd (STEP 550)

This dynamic noise power threshold, T_Prsd, is updated based on the following
conditions:

If the reference channel signal energy is more than 700 and the power ratio is
less than 0.45 for 64 consecutive processing blocks,

    T_Prsd = α_1 * T_Prsd + (1 - α_1) * P_rsd

Else if the reference channel signal energy is less than 700, then

    T_Prsd = α_2 * T_Prsd + (1 - α_2) * Max_Prsd

In this embodiment, α_1 = 0.67, α_2 = 0.98 and Max_Prsd = 1.3 have been
found to give good results.
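
The two update rules can be expressed as a small state machine. The sketch
below is only one reading of the conditions: the class name, the initial
threshold value and the way the 64 consecutive blocks are counted are
assumptions.

```python
class DynamicPowerThreshold:
    """Tracks the dynamic threshold using the two smoothing rules above."""

    def __init__(self, initial, alpha1=0.67, alpha2=0.98, max_prsd=1.3,
                 energy_level=700.0, ratio_level=0.45, run_length=64):
        self.threshold = initial          # starting value is an assumption
        self.alpha1, self.alpha2 = alpha1, alpha2
        self.max_prsd = max_prsd
        self.energy_level = energy_level
        self.ratio_level = ratio_level
        self.run_length = run_length
        self.run = 0                      # consecutive qualifying blocks so far

    def update(self, ref_energy, p_rsd):
        """Update the threshold for one processing block and return it."""
        if ref_energy > self.energy_level and p_rsd < self.ratio_level:
            self.run += 1
        else:
            self.run = 0
        if self.run >= self.run_length:
            # Strong reference signal with a persistently low power ratio.
            self.threshold = self.alpha1 * self.threshold + (1 - self.alpha1) * p_rsd
        elif ref_energy < self.energy_level:
            # Weak reference signal: relax towards the maximum threshold.
            self.threshold = self.alpha2 * self.threshold + (1 - self.alpha2) * self.max_prsd
        return self.threshold
```

Starting from a threshold of 1.0, 64 consecutive blocks with energy 800 and
power ratio 0.4 move the threshold to 0.802 on the 64th block.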



Adaptive Interference Filter 46 (STEPS 554-558)

FIG. 10 shows a schematic block diagram of the Adaptive Interference Filter
46. This filter adapts to the interference signal and subtracts it from the Sum
Channel so as to derive an output with reduced interference noise.

The filter 46 takes outputs from the Sum and Difference Channels of the filter
44 and feeds the Difference Channel signals in parallel to another set of
adaptive filter elements 750,2,4 and feeds the Sum Channel signal to a
corresponding delay element 756. The outputs from the three filter elements
750,2,4 are subtracted from the output from delay element 756 at difference
element 758 to form an error output e_c1, which is fed back to the filter
elements 750,2,4. The output from filter 46 is also passed to an Adaptive
Signal Multiplexer to mix with the filter output from filter 48 and subtract it
from the Sum Channel.

Again, the Least Mean Square algorithm (LMS) is used to adapt the filter
coefficients W_uq as follows:

    e_{c1}(k) = S_c(k) - S_i(k)                                           (I.1)

Where

    S_i(k) = \sum_{m=1}^{M-1} (W_{uq}^{m}(k))^T Y_m(k)                     (I.2)

and

    Y_m(k) = [d_{cm}^{1}(k), d_{cm}^{2}(k), \ldots, d_{cm}^{L_{uq}}(k)]^T    (I.3)

    W_{uq}^{m}(k+1) = W_{uq}^{m}(k) + 2 \mu_{uq}^{m} Y_m(k) e_{c1}(k)        (I.4)

    \mu_{uq}^{m} = \frac{\beta_{uq}}{\|Y_m\| + \|e_{c1}\|}                  (I.5)

and where β_uq is a user selected factor 0 < β_uq < 2 and where m is
0,1,2...M-1, the number of channels, in this case 0...3.

When only the target signal is present and the Interference filter is updated
wrongly, the error signal in equation I.1 will be very large and the norm of
Y_m will be very small. Hence, by including the norm of the error signal
||e_c1|| in the weight updating μ calculation (equation I.5), μ will become very
small whenever a wrong update of the Interference filter occurs. This step
helps to prevent wrong updating of the weight coefficients of the Interference
filter and hence reduces the effect of signal cancellation.
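
The effect of placing the error magnitude in the denominator of the step size
can be seen in a few lines. The sketch is illustrative: the function names are
assumptions, the error is treated as a scalar sample so its norm is its
absolute value, and a small constant is added to avoid division by zero.

```python
import math


def interference_step_size(y, e_c1, beta=0.5):
    """Step size that shrinks when the error is large relative to the input."""
    norm_y = math.sqrt(sum(v * v for v in y))
    return beta / (norm_y + abs(e_c1) + 1e-12)


def interference_filter_step(s_c, tap_vectors, weights, beta=0.5, adapt=True):
    """One sample of the interference canceller; weights are updated in place.

    s_c         -- delayed Sum Channel sample
    tap_vectors -- tap vectors built from the Difference Channel signals
    Returns (error_output, interference_estimate).
    """
    s_i = sum(sum(w * v for w, v in zip(wm, ym))
              for wm, ym in zip(weights, tap_vectors))
    e = s_c - s_i
    if adapt:
        for wm, ym in zip(weights, tap_vectors):
            mu = interference_step_size(ym, e, beta)
            for i, v in enumerate(ym):
                wm[i] += 2.0 * mu * v * e
    return e, s_i
```

A large error combined with weak Difference Channel input yields a much smaller
step than a small error with strong input, which limits the damage of an
update made while only the target is present.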



Adaptive Ambient Noise Estimation Filter 48 (STEPS 562-566)

FIG. 11 shows a schematic block diagram of the Adaptive Ambient Noise
Estimation Filter 48. This filter adapts to the environment noise and subtracts
it from the Sum Channel so as to derive an output with reduced noise.

The filter 48 takes outputs from the Sum and Difference Channels of the filter
44 and feeds the Difference Channel signals in parallel to another set of
adaptive filter elements 760,2,4 and feeds the Sum Channel signal to a
corresponding delay element 766. The outputs from the three filter elements
760,2,4 are subtracted from the output from delay element 766 at difference
element 768 to form an error output e_c2, which is fed back to the filter
elements 760,2,4. The output from filter 48 is also passed to an Adaptive
Signal Multiplexer to mix with the filter output from filter 46 and subtract it
from the Sum Channel.



Again, the Least Mean Square algorithm (LMS) is used to adapt the filter
coefficients W_no as follows:

    e_{c2}(k) = S_c(k) - S_n(k)

Where:

    S_n(k) = \sum_{m=1}^{M-1} (W_{no}^{m}(k))^T Y_m(k)

and

    Y_m(k) = [d_{cm}^{1}(k), d_{cm}^{2}(k), \ldots, d_{cm}^{L_{no}}(k)]^T

    W_{no}^{m}(k+1) = W_{no}^{m}(k) + 2 \mu_{no}^{m} Y_m(k) e_{c2}(k)

    \mu_{no}^{m} = \frac{\beta_{no}}{\|Y_m\|}

and where β_no is a user selected factor 0 < β_no < 2 and where m is
0,1,2...M-1, the number of channels, in this case 0...3.

Adaptive Signal Multiplexer (STEP 568)

FIG. 12 shows a schematic block diagram of the Adaptive Signal Multiplexer.
This multiplexer adaptively multiplexes the output S_i from interference filter
46 and the output S_n from ambient noise filter 48 to produce two interference
signals I_c and I_s as follows:

    I_c(t) = W_{e1} S_i(t) + W_{e2} S_n(t)

    I_s(t) = W_{n1} S_i(t) + W_{n2} S_n(t)

The weights (W_e1, W_e2) and (W_n1, W_n2) can be changed based on different
input signal environment conditions to minimize signal cancellation or improve
unwanted signal suppression. In this embodiment, the weights are determined
based on the following conditions:

If a target signal is detected and the updating conditions for filter 46 (552)
and filter 48 (560) are false, then W_e1 = 0, W_e2 = 1.0, W_n1 = 0.8 and
W_n2 = 1.0.

Else if no target signal is detected and the updating condition for filter 46
(552) is true, then W_e1 = 1.0, W_e2 = 1.0, W_n1 = 1.0 and W_n2 = 1.0.

Else if no target signal is detected and the updating condition for filter 46
(552) is false and the updating condition for filter 48 (560) is true, then
W_e1 = 0, W_e2 = 1.0, W_n1 = 1.0 and W_n2 = 1.0.
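
The weight selection can be written as a small decision function. In this
sketch the function names are hypothetical, the pairing of each weight with a
particular filter output (first weight on the interference filter output,
second on the noise filter output) is an assumption, and cases not covered by
the three conditions simply keep the previous weights.

```python
def multiplexer_weights(target_detected, update_46, update_48, previous):
    """Return ((W_e1, W_e2), (W_n1, W_n2)) for the current block.

    `previous` is returned unchanged when none of the listed conditions
    applies (fallback behaviour assumed for this sketch).
    """
    if target_detected and not update_46 and not update_48:
        return (0.0, 1.0), (0.8, 1.0)
    if not target_detected and update_46:
        return (1.0, 1.0), (1.0, 1.0)
    if not target_detected and not update_46 and update_48:
        return (0.0, 1.0), (1.0, 1.0)
    return previous


def multiplex(s_i, s_n, weights):
    """Linearly combine the two filter outputs into (I_c, I_s)."""
    (w_e1, w_e2), (w_n1, w_n2) = weights
    i_c = w_e1 * s_i + w_e2 * s_n     # subtracted from the Sum Channel
    i_s = w_n1 * s_i + w_n2 * s_n     # passed on to frequency domain processing
    return i_c, i_s
```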

I_c is subtracted from the Sum Channel S_c so as to derive an output e_s with
reduced noise and interference. This output e_s is almost interference and
noise free in an ideal situation. However, in a realistic situation, this
cannot be achieved. Either signal cancellation will occur, degrading the target
signal quality, or noise or interference will feed through, degrading the
output signal to noise and interference ratio. The signal cancellation problem
is reduced in the described embodiment by use of the Adaptive Spatial Filter
44, which reduces the target signal leakage into the Difference Channels.
However, in cases where the signal to noise and interference ratio is very
high, some target signal may still leak into these channels.

To further reduce the target signal cancellation problem and unwanted signal
feed through to the output, the other output signal from the Adaptive Signal
Multiplexer, I_s, is fed into the Adaptive Non-Linear Interference and Noise
Suppression Processor 50.



Adaptive Non-Linear Interference and Noise Suppression Processor 50
(STEPS 570-590)

This processor processes input signals in the frequency domain coupled with
the well-known overlap add block-processing technique.

S_c(t), e_s(t) and I_s(t) are buffered into a memory as illustrated in FIG. 13.
The buffer consists of N/2 new samples and N/2 old samples from the
previous block.



A Hanning Window is then applied to the N samples of each buffered signal as
illustrated in FIG. 14, expressed mathematically as follows:

    S_h = H_n \cdot [S_c(t+1), S_c(t+2), \ldots, S_c(t+N)]^T        (H.3)

    E_h = H_n \cdot [e_s(t+1), e_s(t+2), \ldots, e_s(t+N)]^T        (H.4)

    I_h = H_n \cdot [I_s(t+1), I_s(t+2), \ldots, I_s(t+N)]^T        (H.5)

Where (H_n) is a Hanning Window of dimension N, N being the dimension of
the buffer. The "dot" denotes point-by-point multiplication of the vectors. t
is a time index.

The resultant vectors [S_h], [E_h] and [I_h] are transformed into the frequency
domain using the Fast Fourier Transform algorithm as illustrated in equations
H.6, H.7 and H.8 below:

    S_f = FFT(S_h)        (H.6)

    E_f = FFT(E_h)        (H.7)

    I_f = FFT(I_h)        (H.8)


A modified spectrum is then calculated, which is illustrated in Equations H.9
and H.10:

    P_s = |Re(S_f)| + |Im(S_f)| + F(S_f) * r_s        (H.9)

    P_i = |Re(I_f)| + |Im(I_f)| + F(I_f) * r_i        (H.10)

Where "Re" and "Im" refer to taking the absolute values of the real and
imaginary parts, r_s and r_i are scalars and F(S_f) and F(I_f) denote a
function of S_f and I_f respectively.

One preferred function F using a power function is shown below in equations
H.11 and H.12, where "Conj" denotes the complex conjugate:

    P_s = |Re(S_f)| + |Im(S_f)| + (S_f * conj(S_f)) * r_s        (H.11)

    P_i = |Re(I_f)| + |Im(I_f)| + (I_f * conj(I_f)) * r_i        (H.12)

A second preferred function F using a multiplication function is shown below
in equations H.13 and H.14:

    P_s = |Re(S_f)| + |Im(S_f)| + |Re(S_f)| * |Im(S_f)| * r_s    (H.13)

    P_i = |Re(I_f)| + |Im(I_f)| + |Re(I_f)| * |Im(I_f)| * r_i    (H.14)

The values of the scalars (r_s and r_i) control the tradeoff between unwanted
signal suppression and signal distortion and may be determined empirically.
(r_s and r_i) are calculated as 1/(2^vs) and 1/(2^vi), where vs and vi are
scalars. In this embodiment, vs = vi is chosen as 8, giving r_s = r_i = 1/256.
As vs, vi reduce, the amount of suppression will increase.
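
The two preferred forms of the modified spectrum can be computed per frequency
bin as in the sketch below. The function name modified_spectrum and the mode
argument are assumptions made for illustration; the input is any sequence of
complex FFT bins.

```python
def modified_spectrum(spectrum, v=8, mode="power"):
    """Modified spectrum |Re| + |Im| + F * r for each complex bin, r = 1 / 2**v.

    mode="power":   F is the bin multiplied by its complex conjugate
    mode="product": F is |Re| * |Im|
    """
    r = 1.0 / (2 ** v)
    out = []
    for z in spectrum:
        re, im = abs(z.real), abs(z.imag)
        if mode == "power":
            f = (z * z.conjugate()).real     # squared magnitude of the bin
        else:
            f = re * im
        out.append(re + im + f * r)
    return out
```

For a single bin 3 + 4j with v = 8, the power form gives 7 + 25/256 and the
product form gives 7 + 12/256.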

25 

Frequency Scan for similar peaks between P_s and P_i

P_i may contain some of the frequency components of P_s due to the wrong
estimation of P_i. Therefore, frequency scanning is applied to both P_s and P_i
to look for peaks in the same frequency components. Those peaks in P_i are
then multiplied by an attenuation factor, which is chosen to be 0.1 in this
case.

The spectra (P_s) and (P_i) are warped into (Nb) critical bands using the Bark
Frequency Scale [see Lawrence Rabiner and Biing-Hwang Juang,
Fundamentals of Speech Recognition, Prentice Hall 1993]. The number of Bark
critical bands depends on the sampling frequency used. For a sampling
frequency of 16 kHz, there will be Nb = 22 critical bands. The warped Bark
Spectra of (P_s) and (P_i) are denoted as (B_s) and (B_i).
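
A hedged sketch of the warping step follows. The band edges are the commonly
tabulated critical band edges in Hz and are an assumption here, since the text
does not list them; summing the bins that fall in each band is likewise an
assumption. With a 16 kHz sampling frequency this yields 22 bands.

```python
# Commonly tabulated critical band edges in Hz (assumed, not taken from the text).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def warp_to_bark(p, sample_rate=16000):
    """Sum the bins 0..N/2 of an N-point spectrum `p` into Bark critical bands."""
    n = len(p)
    nyquist = sample_rate / 2.0
    # One band for every lower edge that lies below the Nyquist frequency.
    nb = sum(1 for edge in BARK_EDGES_HZ[:-1] if edge < nyquist)
    bands = [0.0] * nb
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        band = 0
        while band + 1 < nb and freq >= BARK_EDGES_HZ[band + 1]:
            band += 1                 # advance to the band containing freq
        bands[band] += p[k]
    return bands
```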
Voice Unvoiced Detection and Amplification

This is used to detect voiced or unvoiced signal from the Bark critical bands
of the sum signal and hence reduce the effect of signal cancellation on the
unvoiced signal. It is performed as follows:

    B_s = [B_s(1), B_s(2), \ldots, B_s(Nb)]^T

    Voice_sum = \sum_{n=1}^{k} B_s(n)
        where k is the voice band upper cutoff

    Unvoice_sum = \sum_{n=l}^{Nb} B_s(n)
        where l is the unvoiced band lower cutoff

    Unvoice_Ratio = \frac{Unvoice_sum}{Voice_sum}

    If Unvoice_Ratio > Unvoice_Th
        B_s(n) = B_s(n) \times A
        where l \le n \le Nb

In this embodiment, the values of the voice band upper cutoff k, unvoiced band
lower cutoff l, unvoiced threshold Unvoice_Th and amplification factor A are
equal to 16, 18, 10 and 8 respectively.
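
The detection and amplification step can be sketched as below, using the
parameter values of this embodiment as defaults. The function name and the
interpretation of the ratio as the upper-band sum divided by the lower-band sum
are assumptions of this example; band numbers are 1-based as in the text.

```python
def amplify_unvoiced(b_s, k=16, l=18, threshold=10.0, gain=8.0):
    """Amplify bands l..Nb when the unvoiced-to-voiced band energy ratio is high.

    Returns (new_band_values, ratio); the input list is not modified.
    """
    voice_sum = sum(b_s[:k])          # bands 1..k
    unvoice_sum = sum(b_s[l - 1:])    # bands l..Nb
    ratio = unvoice_sum / voice_sum if voice_sum > 0.0 else float("inf")
    if ratio > threshold:
        return b_s[:l - 1] + [v * gain for v in b_s[l - 1:]], ratio
    return list(b_s), ratio
```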

A Bark Spectrum of the system noise and environment noise is similarly
computed and is denoted as (B_n). B_n is first established during system
initialization as B_n = B_s and continues to be updated when no target signal
is detected (step) by the system, i.e. during any silence period. B_n is
updated as follows:

    if ((E_r3 < T_n1) || (loop_cnt < 20))
    {
        if (E_r3 < nl1)
            α = 0.98;
        else
            α = 0.90;

        nl1 = α*nl1 + (1-α)*E_r1;
        B_n = α*B_n + (1-α)*B_s;
    }

25 

Using (B_s, B_i and B_n), a non-linear technique is used to estimate a gain
(G_b) as follows:

First the unwanted signal Bark Spectrum is combined with the system noise
Bark Spectrum by using an appropriate weighting function as illustrated in
Equation J.1.

    B_y = \Omega_1 B_i + \Omega_2 B_n                        (J.1)

Ω_1 and Ω_2 are weights which can be chosen empirically so as to maximize
unwanted signal and noise suppression with minimized signal distortion. In
this embodiment, Ω_1 = 1.0 and Ω_2 = 0.25.

Following that, a post signal to noise ratio is calculated using Equations J.2
and J.3 below:

    R_{po} = \frac{B_s}{B_y}                                (J.2)

    R_{pp} = R_{po} - I_{Nb \times 1}                        (J.3)

The division in equation J.2 means element-by-element division and not
vector division. R_po and R_pp are column vectors of dimension (Nb x 1), Nb
being the dimension of the Bark Scale Critical Frequency Band, and I_{Nb x 1}
is a column unity vector of dimension (Nb x 1) as shown below:

    R_{po} = [r_{po}(1), r_{po}(2), \ldots, r_{po}(Nb)]^T     (J.4)

    R_{pp} = [r_{pp}(1), r_{pp}(2), \ldots, r_{pp}(Nb)]^T     (J.5)

    I_{Nb \times 1} = [1, 1, \ldots, 1]^T                    (J.6)

If any of the r_pp elements of R_pp are less than zero, they are set equal to
zero.

Using the Decision Direct Approach [see Y. Ephraim and D. Malah: Speech
Enhancement Using Optimal Non-Linear Spectrum Amplitude Estimation;
Proc. IEEE International Conference Acoustics Speech and Signal Processing
(Boston) 1983, pp. 1118-1121], the a-priori signal to noise ratio R_pr is
calculated as follows:

    R_{pr} = (1 - \beta_i) R_{pp} + \beta_i \frac{B_o}{B_y}        (J.7)

The division in Equation J.7 means element-by-element division. B_o is a
column vector of dimension (Nb x 1) and denotes the output signal Bark Scale
Bark Spectrum from the previous block, B_o = G_b x B_s (see Equation J.15) (B_o
initially is zero); R_pr is also a column vector of dimension (Nb x 1). The
value of β_i is given in Table 2 below:

    i      1         2        3       4      5
    β_i    0.01625   0.1225   0.245   0.49   0.98

    Table 2

The value i is set equal to 1 at the onset of a signal and the β_i value is
therefore equal to 0.01625. Then the i value will count from 1 to 5 on each new
block of N/2 samples processed and stay at 5 until the signal is off. The i
will start from 1 again at the next signal onset and β_i is taken accordingly.

Instead of β_i being constant, in this embodiment it is made variable and
starts at a small value at the onset of the signal to prevent suppression of
the target signal and increases, preferably exponentially, to smooth R_pr.

From this, R_rr is calculated as follows:

    R_{rr} = \frac{R_{pr}}{I_{Nb \times 1} + R_{pr}}              (J.8)

The division in Equation J.8 is again element-by-element. R_rr is a column
vector of dimension (Nb x 1).

From this, L_x is calculated:

    L_x = R_{rr} \cdot R_{po}                                    (J.9)

The value of L_x is limited to π (≈3.14). The multiplication in Equation J.9
means element-by-element multiplication. L_x is a column vector of dimension
(Nb x 1) as shown below:

    L_x = [l_x(1), l_x(2), \ldots, l_x(Nb)]^T                     (J.10)

A vector L_y of dimension (Nb x 1) is then defined as:

    L_y = [l_y(1), l_y(2), \ldots, l_y(Nb)]^T                     (J.11)

Where nb = 1,2...Nb. Then L_y is given as:

    l_y(nb) = \exp\left(\frac{E(nb)}{2}\right)                   (J.12)

and

    E(nb) = -0.57722 - \log(l_x(nb)) + l_x(nb) - \frac{(l_x(nb))^2}{4}
            + \frac{(l_x(nb))^3}{18} - \frac{(l_x(nb))^4}{96} + \ldots   (J.13)

E(nb) is truncated to the desired accuracy. L_y can be obtained using a look-up
table approach to reduce computational load.



Finally, the Gain G_b is calculated as follows:

    G_b = R_{rr} \cdot L_y                                       (J.14)

The "dot" again implies element-by-element multiplication. G_b is a column
vector of dimension (Nb x 1) as shown:

    G_b = [g(1), g(2), \ldots, g(Nb)]^T                           (J.15)

As G_b is still in the Bark Frequency Scale, it is then unwarped back to the
normal linear frequency scale of N dimensions. The unwarped G_b is denoted
as G.
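
The chain from post signal to noise ratio to gain can be sketched per Bark band
as follows. This is an illustration under stated assumptions: the function
names are hypothetical, the series for E is summed to 20 terms rather than
truncated after the fourth power, the a-priori ratio is formed as a weighted
mix of the current and previous-block estimates, and a band whose limited L_x
is zero is given a gain of zero.

```python
import math

EULER = 0.57722   # constant appearing in the series for E


def e_series(x, terms=20):
    """E(x) = -0.57722 - log(x) + x - x^2/4 + x^3/18 - x^4/96 + ... (x > 0)."""
    total = -EULER - math.log(x)
    for k in range(1, terms + 1):
        total -= ((-x) ** k) / (k * math.factorial(k))
    return total


def bark_gain(b_s, b_y, b_o_prev, beta):
    """Per-band gain from the signal spectrum b_s, combined unwanted-signal
    spectrum b_y (assumed positive) and previous output spectrum b_o_prev."""
    gains = []
    for s, y, o in zip(b_s, b_y, b_o_prev):
        r_po = s / y                                   # post signal to noise ratio
        r_pp = max(r_po - 1.0, 0.0)                    # negative values set to zero
        r_pr = (1.0 - beta) * r_pp + beta * (o / y)    # a-priori ratio (assumed mix)
        r_rr = r_pr / (1.0 + r_pr)
        l_x = min(r_rr * r_po, math.pi)                # limited to pi
        if l_x <= 0.0:
            gains.append(0.0)                          # assumed handling of l_x = 0
            continue
        l_y = math.exp(0.5 * e_series(l_x))
        gains.append(r_rr * l_y)
    return gains
```

Bands dominated by the target signal receive a gain close to one, while bands
at or below the noise level are driven towards zero.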

The output spectrum with unwanted signal suppression is given as:

    S'_f = (1 - R_{sdf}) \cdot G \cdot S_f + R_{sdf} \cdot E_f      (J.16)

The "dot" again implies element-by-element multiplication. In Equation J.16, if
R_sdf is high (implying high signal energy to interference energy), the output
signal spectrum is weighted more from E_f than from the noise suppression part
(G·S_f) to prevent signal cancellation caused by the noise suppression part.

The recovered time domain signal is given by:

    S_t = Re(IFFT(S'_f))                                         (J.17)

IFFT denotes an Inverse Fast Fourier Transform, with only the Real part of
the inverse transform being taken.

Finally the output time domain signal is obtained by overlap add with the
previous block of output signal:

    \hat{S}_t = [S_t(1), S_t(2), \ldots, S_t(N/2)]^T + Z_t        (J.18)

Where:

    Z_t = [S_{t-1}(1 + N/2), S_{t-1}(2 + N/2), \ldots, S_{t-1}(N)]^T   (J.19)
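
The overlap add step can be sketched as follows. The helper names are
hypothetical, and the periodic form of the Hanning window used in the check is
an assumption; with that form, windows overlapped by N/2 samples sum to one, so
a constant input is reconstructed exactly after the first block.

```python
import math


def hann_periodic(n):
    """Periodic Hanning window of length n (an assumed window definition)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(block, prev_tail):
    """Add the first half of `block` to the saved second half of the previous
    block. Returns (output_samples, tail_to_keep_for_next_block)."""
    half = len(block) // 2
    out = [block[i] + prev_tail[i] for i in range(half)]
    return out, block[half:]
```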



The embodiment described is not to be construed as limitative. For example,
there can be any number of channels from two upwards. Furthermore, as will
be apparent to one skilled in the art, many steps of the method employed are
essentially discrete and may be employed independently of the other steps or
in combination with some but not all of the other steps. For example, the
adaptive filtering and the frequency domain processing may be performed
independently of each other, and the frequency domain processing steps such
as the use of the modified spectrum, warping into the Bark scale and use of
the scaling factor β_i can be viewed as a series of independent tools which
need not all be used together.

Turning now to Figs. 16 and 17, an embodiment of the invention is shown
which is a headset system. As shown schematically in Fig. 16, the system has
two units, namely a base station 71 and a mobile unit 72.

The base unit provides connection to any host system 73 (such as a PC)
through a USB (universal serial bus). It acts as a router for streaming audio
information between the host system and the mobile unit 72. It is formed with
a cradle (not shown) for receiving and holding the mobile unit 72. The cradle
is preferably provided with a charging unit co-operating with a rechargeable
power source which is part of the mobile unit 72. The charging unit charges
the power source while the mobile unit 72 is held by the cradle.

The base unit 71 includes at least one aerial 74 for two-way wireless
communication with at least one aerial 75 of the mobile unit 72. The mobile
unit includes a loudspeaker 76 (shown physically connected to the mobile unit
72 by a wire, though as explained below, this is not necessary), and at least
two microphones (audio sensors) 77. The wireless link between mobile unit
72 and base station 71 is a highly secure RF Bluetooth link.



Fig. 17 shows the mobile unit 72 in more detail. It has a structure defining an
open loop 78 to be placed around the head or neck of a user, for example so
as to be supported on the user's shoulders. At the two ends of the loop are
multiple microphones 77 (normally 2 or 4 in total), to be placed in proximity
to the user's mouth for receiving voice input. One or more batteries 79 may be
provided near the microphones 77. In this case there are two antennas 75
embedded in the structure. Away from the antennas, the loop 78 is covered
with RF absorbing material. A rear portion 80 of the loop is a flex-circuit
containing digital signal processing and RF circuitry.



The system further includes an ear speaker (not shown) magnetically coupled 
to the mobile unit 72 by components (not shown) provided on the mobile unit 
72. The user wears the ear speaker in one of his ears, and it allows audio 
output from the host system 73. This enables two-way communication 
25 applications, such as internet telephony and other speech and audio 
applications. 

Preferably, the system includes digital circuitry carrying out a method
according to the invention on audio signals received by the multiple
microphones 77. Some or all of the circuitry can be within the circuitry 80
and/or within the base unit 71.

Figures 18(a) and 18(b) show two ways in which a user can wear the mobile
unit 72 having the shape illustrated in Fig. 17. In Fig. 18(a) the user wears
the mobile unit 72 resting on the top of his head with the microphones close to
his mouth. In Fig. 18(b) the user has chosen to wear the mobile unit 72
supported by his shoulders and with the two arms of the loop embracing his
neck, again with the microphones close to his mouth.



Use of first, second etc. in the claims should only be construed as a means of
identification of the integers of the claims, not of process step order. Any
novel feature or combination of features disclosed is to be taken as forming
an independent invention whether or not specifically claimed in the appendant
claims of this application as initially filed.

Claims

1. A headset system including a base unit and a headset unit to be worn
by a user and having a plurality of microphones, the headset unit and base
unit being in mutual wireless communication, and at least one of the base unit
and the headset unit having digital signal processing means arranged to
perform signal processing in the time domain on audio signals generated by
the microphones, the digital signal processing means including at least one
adaptive filter to enhance a wanted signal in the audio signals and at least
one adaptive filter to reduce an unwanted signal in the audio signals.

2. A headset system according to claim 1 in which the base unit includes 
a cradle for holding the headset unit. 

3. A headset system according to claim 1 or claim 2 in which the headset
unit is associated with a loudspeaker operable by the headset unit for
generating audio signals to the user.

4. A headset system according to claim 1 in which the digital signal
processing means includes:

a first adaptive filter arranged to enhance a target signal in the digital
signals, and

a second adaptive filter and a third adaptive filter each receiving the
output of the first adaptive filter,

the second filter being arranged to suppress unwanted interference
signals, and the third filter being arranged to suppress noise signals.

5. A headset system according to claim 4 in which the digital processing
means is adapted to combine the outputs of the second and third adaptive
filters, convert to the frequency domain and perform further processing in the
frequency domain.

6. A headset system according to claim 5 in which an output S_i(t) of the
second filter and an output S_n(t) of the third filter are linearly combined
using weighting factors to derive two interference signals, a first of the
interference signals I_c being subtracted from the output of the first filter,
and a second of the interference signals I_s being converted into the frequency
domain.

7. A headset system according to any of claims 4 to 6 in which the
second and third filters are not adapted if it is determined that a target
signal is present.

8. A headset system according to any of claims 4 to 7 in which the
second filter is not updated if it is determined that an interference signal is
not present.

9. A headset system according to claim 7 or claim 8 further comprising
the step of at intervals determining signal energy, and deriving at least one
noise threshold from a plurality of values of the signal energy, said
determination including determining whether a further signal energy is above
the noise threshold.

10. A headset system according to claim 9 in which the derivation of said
noise threshold includes using the plurality of signal energy values to derive a
histogram representing the statistical frequencies of signal energy values in
each of a number of bands, and deriving the noise threshold from a signal
energy value Emax associated with the band having the highest histogram
value.
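
By way of illustration only, the histogram-based threshold derivation of claims 9 and 10 might be sketched as follows. The number of bands, the use of the band centre as Emax, and the multiplicative `margin` applied to Emax are assumptions made for this sketch and are not taken from the claims.

```python
def noise_threshold(energies, num_bands=20, margin=1.5):
    """Derive a noise threshold from a history of signal energy values.

    Builds a histogram of the energies over equal-width bands, takes Emax as
    the centre of the most populated band, and scales it by a hypothetical
    margin to obtain the threshold.
    """
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / num_bands or 1.0  # avoid a zero width if all values are equal
    counts = [0] * num_bands
    for e in energies:
        idx = min(int((e - lo) / width), num_bands - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))      # band with the highest histogram value
    e_max = lo + (peak + 0.5) * width     # energy value associated with that band
    return margin * e_max, e_max


def is_above_noise(energy, threshold):
    """Test whether a further signal energy is above the noise threshold."""
    return energy > threshold
```

The idea behind the sketch is that, when most frames contain only background noise, the most populated energy band tracks the noise floor, so a threshold placed somewhat above it separates noise-only frames from frames containing a signal.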

11. A headset system according to any of claims 4 to 10 in which the digital
signal processing means comprises a fourth adaptive filter for determining the
direction of arrival of the target signal.

12. A headset system according to claim 11 in which the weights of the
fourth adaptive filter are updated including repeatedly performing an update
process which attenuates each existing weight value by a forgetting factor α.
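
By way of illustration only, an update process with a forgetting factor might be sketched as follows. Claim 12 states only that each existing weight value is attenuated by the forgetting factor; the LMS-style correction term, the step size `mu` and the default values are assumptions made for this sketch.

```python
def leaky_update(weights, x, error, mu=0.01, alpha=0.999):
    """One weight update with a forgetting factor.

    Each existing weight is first attenuated by alpha (0 < alpha <= 1), then
    an LMS-style correction mu * error * x is added (an assumed form).
    """
    return [alpha * w + mu * error * xi for w, xi in zip(weights, x)]
```

With a zero input vector the correction term vanishes and each weight simply decays by the factor `alpha` at every call, which shows the attenuation on its own.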

13. A headset system according to claim 11 or 12, in which the digital
signal processing means is adapted to determine a ratio Pk indicating the ratio
of the highest central weight value A of the fourth adaptive filter to the sum of
A and the highest peripheral weight value B, the digital signal processing
means only adapting the first filter if the ratio Pk is above a given value TPk1.
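
By way of illustration only, the ratio of claim 13 can be written as Pk = A / (A + B) and might be computed as sketched below. The definition of the central region (a hypothetical half-width around the middle tap), the use of weight magnitudes and the example threshold value are assumptions made for this sketch.

```python
def peak_ratio(weights, central_half_width=2):
    """Return Pk = A / (A + B) for a list of filter weights.

    A is the largest magnitude among the central taps and B the largest
    magnitude among the remaining (peripheral) taps. The extent of the
    central region is an assumed parameter.
    """
    centre = len(weights) // 2
    lo = max(0, centre - central_half_width)
    hi = min(len(weights), centre + central_half_width + 1)
    a = max(abs(w) for w in weights[lo:hi])
    b = max((abs(w) for w in weights[:lo] + weights[hi:]), default=0.0)
    return a / (a + b) if (a + b) > 0 else 0.0


def may_adapt_first_filter(weights, threshold=0.6):
    """Permit adaptation of the first filter only if Pk exceeds the threshold."""
    return peak_ratio(weights) > threshold
```

A ratio close to 1 means that the dominant weight lies in the central region, while a ratio near 0.5 or below means that a peripheral weight is at least as large.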

14. A headset system according to claim 13 in which, following an
adaptation of the first filter, the digital signal processing means calculates a
new value of the ratio, determines whether the new value is below the
previous maximum value of the ratio and below a threshold Tpk, and if so restores
at least one of the first, second and third filters to its previous state.

15. A headset system according to claim 13 or claim 14 when dependent
on claim 8 in which the determination that an interference signal is not present
includes a determination that the value of said ratio is below a threshold Tpk.

16. A headset system according to any of claims 4 to 15 in which the
weights of the second filter are adapted by a weight updating factor μ which
varies inversely with an error output ec1 of the second filter.
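
By way of illustration only, a weight updating factor that varies inversely with the error output might be sketched as follows. The particular form beta / (eps + |e|), the constant `beta` and the small regularising term `eps` are assumptions made for this sketch; claim 16 states only that the factor varies inversely with the error output.

```python
def step_size(error_output, beta=0.1, eps=1e-6):
    """Return a weight updating factor that shrinks as the error output grows.

    beta scales the factor and eps guards against division by zero; both are
    hypothetical parameters.
    """
    return beta / (eps + abs(error_output))
```

With this form a large error output gives a small updating factor, so the second filter adapts cautiously when its error is large.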

17. A headset system according to claim 5 in which the combined signals
are transformed into two frequency domain signals which are a desired signal
Sf and an interference signal If, Sf and If are transformed into respective
modified spectra Ps and Pi, and the modified spectra are warped into respective
Bark spectra Bs and Bi.

18. A headset system according to claim 17 in which, prior to said warping,
frequency scanning is applied to the modified spectra Ps and Pi, and peaks
which are found to be common to both are attenuated in Pi.

19. A headset system according to claim 17 or claim 18 in which a ratio is
derived of the sum of the values of Bs over the Bark critical bands up to the
voice band upper cutoff, and the sum of the values of Bs over the Bark critical
bands at and above the unvoiced band lower cutoff.

20. A headset system according to claim 16 in which, if the ratio is above a
given threshold, the values of Bs above the unvoiced band lower threshold are
amplified.
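
By way of illustration only, the ratio of claim 19 and the conditional amplification of claim 20 might be sketched as follows. The representation of the Bark spectrum Bs as a list indexed by critical band, the band indices, the gain value and the inclusion of the lower cutoff band itself in the amplified range are assumptions made for this sketch.

```python
def voiced_to_unvoiced_ratio(bark, voice_upper, unvoiced_lower):
    """Ratio of summed Bs up to the voice band upper cutoff (inclusive) to
    summed Bs at and above the unvoiced band lower cutoff."""
    voiced = sum(bark[: voice_upper + 1])
    unvoiced = sum(bark[unvoiced_lower:])
    return voiced / unvoiced if unvoiced > 0 else float("inf")


def emphasise_unvoiced(bark, voice_upper, unvoiced_lower, threshold, gain):
    """Amplify the unvoiced bands by a hypothetical gain if the ratio exceeds
    the threshold; otherwise return the spectrum unchanged."""
    if voiced_to_unvoiced_ratio(bark, voice_upper, unvoiced_lower) <= threshold:
        return list(bark)
    return [v * gain if k >= unvoiced_lower else v for k, v in enumerate(bark)]
```

For a five-band spectrum `[4.0, 4.0, 2.0, 1.0, 1.0]` with the voice band ending at band 1 and the unvoiced band starting at band 3, the ratio is 8 / 2 = 4, so a threshold of 3 triggers the amplification while a threshold of 5 does not.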

21. A headset system according to any preceding claim further including
a speech recognition engine receiving the output of the digital signal
processing means.

22. A headset system according to claim 18 in which the speech
recognition engine receives from the digital signal processing means
information indicating any one or more of:

a) a direction of a target signal Td,

b) a signal energy Er1,

c) a noise threshold used by the digital signal processing means,

d) an estimated SINR (target signal to interference ratio) and SNR
(target signal to noise ratio),

e) a signal A' indicating the presence of target speech,

f) a spectrum of processed speech signal Sout,

g) potential speech start and end points, and

h) an interference signal spectrum, If (step 570).

23. A headset system according to any preceding claim in which the
headset unit comprises two arms for location proximate the mouth of the user
and for positioning to either side of the user's head.

24. A headset system according to claim 23 in which the headset is
suitable for positioning supported by the user's shoulders with the arms
embracing the user's neck.

25. A headset system according to claim 23 or claim 24 in which at least 
one microphone is provided on a free end of each of the arms. 






26. A headset unit for use in the headset system of any preceding claim. 



27. A method of processing signals received from an array of sensors
comprising the steps of sampling and digitising the received signals and
processing the digitally converted signals, the processing including:

filtering the digital signals using a first adaptive filter arranged to
enhance a target signal in the digital signals,

transmitting the output of the first adaptive filter to a second adaptive
filter and to a third adaptive filter, the second filter being arranged to suppress
unwanted interference signals, and the third filter being arranged to suppress
noise signals; and

combining the outputs of the second and third filters.






28. Signal processing apparatus arranged to carry out a method according
to claim 27.






A microphone headset comprising first and second microphones 
disposed at respective ends of a support, the support being adapted to 
be worn around the neck or head of a user. 